Introduction


The  overriding  goal  of  this  project  has  been  to  develop  a  general 
framework  for  representation  of  item  responses  which  can  be  used  to 
represent  data  in  applications  such  as  mastery  tests  and  other  kinds  of 
achievement tests, where there is reason to believe that current
foundations are deficient. The strategy which I originally proposed
for  pursuing  this  goal  involved  building  a  model  for  signed-number 
addition  test  data  gathered  by  Tatsuoka  and  Birenbaum  (1979).  They 
have  shown  that  this  data  cannot  be  represented  by  a  unidimensional 
model  because  of  a  number  of  systematic  error  patterns  exhibited  by 
different  subgroups  of  students.  The  immediate  subgoals  of  the  project 
have  been  to: 

1. validate a finite latent state model which I developed to
account for this data;

2. extend this model to deal with change over time; and

3. develop optimal procedures based on the model for testing
mastery of the signed-number addition concept.

The  first  of  these  subgoals  has  taken  more  time  to  reach  than 
anticipated,  but  pursuit  of  it  has  yielded  results  of  more  general 
applicability  than  I  had  originally  hoped  to  obtain.  These  results, 
which I described at the October 1984 ONR Contractors' Meeting at ETS,
will  be  discussed  further  in  the  next  section  of  the  report.  The 
section  after  that  will  treat  extensions  of  these  results  to  models 
which  impose  monotone  homogeneity  constraints  on  the  item  parameters. 
The extensions are important because they serve to explicitly relate the
general latent class model representation to standard item response
theory representations of test data and provide a basis for deciding
whether or not the latter representations are appropriate for a given
set  of  data.  The  final  section  of  the  report  will  describe  some 
preliminary  results  concerning  extensions  of  the  model  to  deal  with 
change  over  time,  simultaneous  modelling  of  more  than  one  response 
component,  and  some  of  the  implications  of  these  results  for  testing 
procedures  based  on  the  model. 

Latent  Class  Models  for  Item  Responses 

In  order  to  validate  the  finite  latent  state  models  which  I  had 
developed  for  the  signed-number  addition  data,  it  occurred  to  me  that 
it  would  be  nice  to  formulate  a  more  general  model  which  would  include 
my  models  as  special  cases.  Then,  if  reasonable  estimation  procedures 
and goodness-of-fit indices could be devised for the general model, it
would  be  possible  to  answer  a  number  of  questions  about  the  validity  of 
specific  models.  It  occurred  to  me  that  Lazarsfeld's  Latent  Class 
Structure  models  would  include  my  models  as  special  cases.  However, 
there  were  problems  with  estimating  parameters  in  latent  class  models 
which  seemed  to  limit  the  applicability  of  methods  associated  with  them 
to  my  problems.  The  complexity  of  existing  approaches  grows 
exponentially with the number of items. Ten items would be considered
a  lot;  I  was  dealing  with  twenty-item  tests. 

Since  many  of  the  interesting  implications  of  my  model  concern  the 
structure  of  interitem  correlations,  I  decided  to  try  to  estimate 
parameters  by  fitting  covariance  matrices,  in  the  spirit  of  Joreskog's 
analysis  of  covariance  structures.  I  developed  the  necessary 
theoretical formulas and some computer programs to implement this
generalized  least  squares  approach  and  presented  them  at  the  October 
1983  ONR  Contractors'  Meeting  at  the  University  of  Illinois.  I  noted 
various  difficulties  in  getting  the  algorithms  to  converge  and  outlined 
a  quasi-Newton  algorithm  which  I  hoped  would  circumvent  them. 

The  proposed  approach  to  parameter  estimation  was  greeted  with 
some skepticism at the Contractors' Meeting. It was suggested that I
re-examine  the  literature  on  maximum  likelihood  estimation  in  latent 
class  models.  I  was  not  eager  to  do  this,  for  reasons  alluded  to 
above,  but  it  did  seem  that  it  might  be  worth  pursuing  the  EM  approach 
being  used  by  Tsutakawa  and  Bock  on  other  problems.  I  did  this,  and  in 
effect,  wound  up  reinventing  Goodman's  (1974)  algorithm  for  constrained 
marginal  maximum  likelihood  estimation  in  latent  class  models,  but  with 
an  essential  modification  which  dramatically  extends  the  algorithm  to 
apply  to  tests  with  many  items. 

The  EM  approach  yields  a  particularly  simple  algorithm  in  the  case 
of  the  latent  class  model.  The  computations  on  each  iteration  are 
straightforward  because  of  the  finite  number  of  states.  In  the 
expectation phase, or E-phase, of each iteration the conditional state
probabilities, given the trial parameter values and the subject's
responses, are computed, and each subject is apportioned among the states
according to these probabilities. Then, in the maximization, or M-phase, the parameter
values  are  revised  by  computing  estimated  "sample"  proportions  of 
subjects  in  each  state  and  estimated  "sample"  proportions  passing  each 
item,  given  the  state,  based  on  the  results  of  the  E-phase. 
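
The two phases just described can be sketched in a few lines of Python. This is only an illustrative sketch for dichotomous (0/1) items with no constraints on the parameters, not the program used in this project; the function name, argument names, and the fixed iteration count are assumptions made for the example.

```python
import math

def em_latent_class(responses, p_init, w_init, n_iter=100):
    """EM for an unconstrained latent class model with 0/1 item responses.

    responses : list of response vectors, one list of 0/1 values per subject
    p_init    : p_init[k][j], starting P(correct on item j | state k), in (0, 1)
    w_init    : w_init[k], starting proportion of subjects in state k
    """
    n_subjects = len(responses)
    n_states = len(w_init)
    n_items = len(responses[0])
    p = [row[:] for row in p_init]
    w = list(w_init)
    history = []  # marginal log-likelihood at the start of each iteration
    for _ in range(n_iter):
        # E-phase: conditional state probabilities for each subject,
        # given the trial parameter values and the subject's responses.
        post = []
        log_lik = 0.0
        for x in responses:
            joint = []
            for k in range(n_states):
                like = w[k]
                for j in range(n_items):
                    like *= p[k][j] if x[j] == 1 else 1.0 - p[k][j]
                joint.append(like)
            total = sum(joint)
            log_lik += math.log(total)
            post.append([v / total for v in joint])
        history.append(log_lik)
        # M-phase: estimated "sample" proportions in each state, and
        # estimated "sample" proportions passing each item, given state.
        for k in range(n_states):
            n_k = sum(row[k] for row in post)
            w[k] = n_k / n_subjects
            for j in range(n_items):
                n_kj = sum(row[k] * x[j] for row, x in zip(post, responses))
                p[k][j] = n_kj / n_k
    return w, p, history
```

Because every revised value is a ratio of a partial expected count to a total expected count, the estimates stay inside the unit interval without any explicit bounding, and the recorded log-likelihoods should not decrease from one iteration to the next.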


One  of  the  difficulties  with  my  earlier  approach  to  estimation  was 
a  tendency  for  the  estimates  to  drift  outside  the  unit  interval  to 
which  they  are  constrained  by  the  fact  that  they  are  all 
probabilities.  These  constraints  are  always  automatically  satisfied  by 
the  present  algorithm.  Not  only  are  these  constraints  satisfied,  it  is 
easy  to  modify  the  algorithm  to  require  subsets  of  the  item  parameters 
to  be  equal  or  complementary  to  each  other.  These  additional 
constraints  are  also  automatically  satisfied  by  the  nature  of  the 
algorithm. 
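
One way such equality and complementarity constraints might be imposed in the M-phase is to pool the expected counts of all the parameters that are tied together before forming the proportion. The sketch below assumes this pooling rule (it is the maximum likelihood solution for tied binomial proportions); the function name and the tuple layout are hypothetical and are not taken from the program described here.

```python
def pooled_estimate(cells):
    """Common value p for a set of tied item parameters.

    cells : list of (expected_correct, expected_total, complementary) tuples,
            one per (state, item) parameter in the tied set. A parameter
            flagged complementary is constrained to equal 1 - p, so its
            expected incorrect count is pooled instead of its correct count.
    """
    numerator = 0.0
    denominator = 0.0
    for expected_correct, expected_total, complementary in cells:
        if complementary:
            numerator += expected_total - expected_correct
        else:
            numerator += expected_correct
        denominator += expected_total
    return numerator / denominator
```

Since the pooled value is again a ratio of expected counts, it remains a probability, which is consistent with the remark that the additional constraints are satisfied automatically.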

When  the  maximum  likelihood  estimates  have  been  obtained,  it  is 
easy  to  compute  the  marginal  likelihood  of  the  data  as  a  whole.  By 
computing  likelihoods  under  hypotheses  imposing  different  constraints, 
one  can  perform  likelihood  ratio  tests  to  answer  a  variety  of 
questions.  When  these  tests  are  applied  to  the  signed-number  addition 
data,  the  specific  models  which  I  have  proposed  are  seen  to  give  a 
qualitatively  good  account  of  the  data,  but  they  are  wrong  on  some 
details.  For  example,  the  models  imply  that  items  within  types  should 
be  equivalent  in  the  sense  of  having  identical  parameters.  This 
equivalence  hypothesis  must  be  rejected.  The  models  imply,  that  in 
states  corresponding  to  systematic  response  patterns,  the  probabilities 
of  deviant  responses  are  the  same  for  all  item  types.  This  hypothesis 
must  also  be  rejected. 
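
The mechanics of such a likelihood ratio test can be sketched as follows, assuming the two maximized log-likelihoods and the difference in numbers of free parameters are already in hand. The function names are hypothetical. The chi-squared tail probability is computed from the standard recurrence Q(x; v+2) = Q(x; v) + (x/2)^(v/2) e^(-x/2) / Gamma(v/2 + 1), which holds for integer degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        q, v = math.exp(-half), 2                # closed form for df = 2
    else:
        q, v = math.erfc(math.sqrt(half)), 1     # closed form for df = 1
    while v < df:
        # add the term that carries Q(x; v) to Q(x; v + 2)
        q += math.exp((v / 2.0) * math.log(half) - half
                      - math.lgamma(v / 2.0 + 1.0))
        v += 2
    return q

def likelihood_ratio_test(loglik_general, loglik_constrained, df):
    """Return the statistic 2(log L_general - log L_constrained) and its p-value."""
    statistic = 2.0 * (loglik_general - loglik_constrained)
    return statistic, chi2_sf(statistic, df)
```

The constrained model must be nested in the general one for the chi-squared reference distribution to apply, and the usual caution about small samples holds.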

While  the  null  hypotheses  must  be  rejected,  examination  of  the 
unconstrained  parameter  estimates  reveals  that  the  deviations  from 
these  hypotheses  are  relatively  minor.  If  only  small  samples  are 
available for estimating parameters, as is the case here, the simpler
constrained  models  probably  provide  a  more  robust  representation  of  the 
data  than  the  more  general  models. 

It  would  have  been  surprising  if  these  analyses  had  turned  out 
any  differently  than  they  did,  because  Yamamoto  (1983)  got  very  similar 
results  with  the  same  data  but  different  methods.  Besides  confirming 
Yamamoto's  results,  the  point  of  these  analyses  is  that  they 
demonstrate  the  use  of  a  much  more  flexible  approach  to  model 
development  questions  for  latent  class  models. 

During  December  9-21,  1984  I  participated  in  the  NATO  Advanced 
Study  Institute  on  Human  Assessment:  Advances  in  Measuring  Cognition 
and  Motivation,  in  Athens,  Greece.  I  presented  a  paper  entitled 
"Latent  Class  Representation  of  Systematic  Patterns  in  Test  Responses," 
which  was  basically  an  account  of  the  work  which  I  have  just  described 
above.  Since  then  I  have  expanded  the  paper  into  a  general  discussion 
of  latent  class  structure  as  a  framework  for  modeling  test  performance, 
using  signed-number  addition  models  to  illustrate  the  process  of  model 
development. The paper, Paulson (1985), will be published in Irvine,
S.H., Newstead, S. and Dann, P. (eds.) Computer-Based Human Assessment,
a volume of selected papers from the ASI, to be published by Nijhoff.
It is also being distributed as a technical report simultaneously with
this  Final  Report.  Five  questions  which  will  frequently  arise  in 
building  latent  class  models  are  treated  at  some  length  in  the  paper. 
The questions are:

1.  How  many  states  should  the  model  have? 

2.  Are  nominally  equivalent  items  really  equivalent? 

3.  Does  a  given  specific  parametric  model  hold? 

4.  Are  the  item  parameters  of  a  given  model  invariant  over  time? 

5.  Are  the  item  parameters  invariant  across  groups? 

Likelihood  ratio  tests  for  dealing  with  each  question  are  described  in 
detail.  It  is  easy  to  generate  these  tests  in  principle,  because  of 
the  ease  of  dealing  with  various  specifications  of  fixed,  equality,  and 
complementarity constraints in the estimation algorithm.

Monotone  Homogeneity  of  Items 

The  likelihood  ratio  principle  has  been  used  to  construct  a  wide 
variety  of  hypothesis  tests  relevant  to  the  development  of  latent  class 
models.  However,  one  issue  which  does  not  lend  itself  directly  to  such 
a  test  is  the  basic  question  of  whether  a  unidimensional  latent  trait 
model  might  adequately  account  for  a  given  data  set.  The  problem  is 
that neither model is nested in the other: the unidimensional model
has an infinite set of states, whereas a finite state latent class
model need not be unidimensional. One way out of the problem would be
to estimate ICC's for some unidimensional model, such as the
three-parameter logistic model, discretize $\theta$ at a finite number of
points sufficient to represent the curves, use the resulting
$P_j(\theta_k)$'s as $P_{kj}$'s for a constrained latent class model, and test
to see if a more general latent class model accounts significantly
better for the data than the discretized unidimensional model. While
this approach may be a good way to test the fit of the particular model
chosen, it is not an adequate test of unidimensional models in
general.  Some  other  unidimensional  model  might  fit  fine,  if  the  model 
chosen  does  not.  A  better  approach  is  suggested  by  the  following 
observation. 
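
For concreteness, the discretization step mentioned above might look like the following sketch, which evaluates three-parameter logistic ICC's on a grid of ability values to obtain fixed conditional probabilities for a constrained latent class model. The function names, the grid, and the conventional scaling constant 1.7 are assumptions of the example, not details taken from this project.

```python
import math

def icc_3pl(theta, a, b, c, scale=1.7):
    """Three-parameter logistic ICC: discrimination a, difficulty b, guessing c."""
    return c + (1.0 - c) / (1.0 + math.exp(-scale * a * (theta - b)))

def discretize_iccs(items, thetas):
    """Return P[k][j], the probability of passing item j at grid point thetas[k].

    items  : list of (a, b, c) parameter triples, one per item
    thetas : increasing list of ability values, one per latent state
    """
    return [[icc_3pl(theta, a, b, c) for (a, b, c) in items]
            for theta in thetas]
```

Each grid point then plays the role of one latent state, and only the state proportions remain to be estimated in the constrained model.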

Suppose  that  we  estimate  conditional  probabilities  of  correct 
response  to  items,  given  state,  in  an  unconstrained  latent  class  model, 
and find that the ordering of the $P_{kj}$'s across states is the same for all items.
That is, the items are "monotonely homogeneous" in the term used by
Charles Lewis (1985). If we do, it strongly suggests that an adequate
unidimensional model could be found. However, if we find
instead  that  the  deviations  from  monotonicity  can  not  be  attributed  to 
sampling  variability,  it  implies  that  no  such  unidimensional  model 
can  be  found. 
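
A descriptive check of this ordering property on a matrix of estimates can be sketched as below. It orders the states by expected number of items correct and asks whether every item's probabilities are non-decreasing along that order. The function name and tolerance argument are hypothetical, and the check looks only at point estimates, so it says nothing about whether a violation exceeds sampling variability.

```python
def is_monotonely_homogeneous(p, tol=0.0):
    """Check whether one ordering of the states works for every item.

    p   : p[k][j], estimated probability of passing item j in state k
    tol : violations no larger than tol are ignored
    """
    # Order states by expected number correct; if any common ordering
    # exists, this one must be consistent with it.
    order = sorted(range(len(p)), key=lambda k: sum(p[k]))
    for lower, higher in zip(order, order[1:]):
        for j in range(len(p[lower])):
            if p[lower][j] > p[higher][j] + tol:
                return False
    return True
```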

Nonparametric estimation of monotonely homogeneous ICC's. A
simple  extension  of  the  algorithms  developed  to  deal  with  equality 
constraints  can  provide  marginal  maximum  likelihood  estimates  of  the 
parameters  in  a  latent  class  model,  subject  only  to  the  constraint  of 
monotone  homogeneity  of  the  item  parameters.  The  fact  that  the 
monotonely  constrained  model  is  nested  in  the  unconstrained  model  with 
the  same  number  of  states  leads  directly  to  a  likelihood  ratio  test  of 
monotone  homogeneity.  If  the  monotone  homogeneity  hypothesis  is 
acceptable, the constrained parameter estimates for each state, plotted
against expected number of items correct, given state, provide
nonparametric marginal maximum likelihood estimates of the ICC's.
Due to the finite number of states, this approach can only yield an
approximation  to  the  ICC's.  However,  if  one  uses  enough  states  this 
should  not  be  too  much  of  a  problem.  I  think  it  would  be  very 
interesting  to  compare  this  approach  to  other  approaches  which  make  no 
assumptions  regarding  the  form  of  the  ICC.  The  approach  is  promising 
because  there  is  a  very  simple  way  to  accommodate  the  monotone 
homogeneity constraint.

The  "Up-and-Down  Blocks"  algorithm.  Consider  a  simpler  problem 
than  the  present  one.  We  have  responses  to  a  given  item  from 
individuals  in  a  series  of  groups,  and  we  assume  the  groups  fall  in  a 
known  order  with  respect  to  probability  of  correct  response  to  the 
item.  What  is  the  maximum  likelihood  estimator  of  the  set  of  group 
probabilities,  subject  to  the  ordering  constraint?  Without  the 
constraint,  the  MLE  is  just  the  set  of  sample  proportions  correct  in 
each  group.  If  the  sample  proportions  happen  to  fall  in  the  assumed 
order,  the  constraint  is  not  active  and  the  unconstrained  MLE  applies. 
If  the  sample  proportions  do  not  all  fall  in  the  prescribed  order,  then 
a  theorem  from  the  theory  of  isotonic  regression  says  how  the 
constrained  MLE  can  be  constructed  from  the  sample  proportions  by 
amalgamating  groups  into  level  sets  within  which  equality  constraints 
apply.  The  "Up-and-Down  Blocks"  algorithm  is  a  simple  procedure 
devised  by  Kruskal  (1964)  for  effecting  this  division  into  level  sets. 
These  developments  are  described  in  detail  by  Barlow,  Bartholomew, 
Bremner,  and  Brunk  (1972).  Since  my  program  can  handle  equality 
constraints, and the $P_{kj}$'s yielded by the unconstrained phase of each
iteration of my program are analogous to sample proportions correct in
the  respective  states,  the  extension  to  the  monotonely  homogeneous 
constraints  is  straightforward. 
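
The amalgamation into level sets for the simpler ordered-groups problem can be sketched as follows. The sketch uses a pool-adjacent-violators pass, which arrives at the same constrained estimates as an up-and-down blocks procedure but is not claimed to reproduce Kruskal's steps or my program exactly; the function name is hypothetical, and every group is assumed to have a positive number of observations.

```python
def isotonic_proportions(correct, totals):
    """Constrained MLE of group probabilities assumed to be non-decreasing.

    correct : number (or expected number) correct in each group, in order
    totals  : number (or expected number) of subjects in each group
    Returns (estimates, level_sets), where level_sets lists the group
    indices amalgamated into each block.
    """
    blocks = []  # each block is [sum_correct, sum_total, member_indices]
    for i, (c, n) in enumerate(zip(correct, totals)):
        blocks.append([float(c), float(n), [i]])
        # Pool backwards while the newest block violates the ordering.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            c_last, n_last, members = blocks.pop()
            blocks[-1][0] += c_last
            blocks[-1][1] += n_last
            blocks[-1][2].extend(members)
    estimates = [0.0] * len(correct)
    for c, n, members in blocks:
        for i in members:
            estimates[i] = c / n
    return estimates, [members for _, _, members in blocks]
```

Within each level set the common estimate is the pooled proportion, which is exactly the form of an equality constraint; this is why the extension from equality constraints is described as straightforward.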

The test of monotone homogeneity. If there are $J$ items on a test
and one is fitting an unconstrained latent class model with $s$ states,
then there are $Js$ free item parameters to be estimated. Let $m_j$
denote the number of level sets determined by the Up-and-Down Blocks
algorithm for item $j$. The number of free item parameters in the model
with the monotone homogeneity constraint is then $\sum_j m_j$. Let $L_u$ and $L_m$
denote the maxima of the likelihood function evaluated under the
unconstrained and monotonely constrained hypotheses, respectively. If
the monotone homogeneity hypothesis is correct, then asymptotically the
likelihood ratio test statistic

$$-2 \log \lambda = 2(\log L_u - \log L_m)$$

has a chi squared distribution with $Js - \sum_j m_j$ degrees of freedom.


This fact can be used to set up critical regions for tests of the
hypothesis.  A  detailed  discussion  of  the  extension  of  the  EM  approach 
to  deal  with  monotone  homogeneity  constraints  is  given  in  Paulson 
(1986),  a  technical  report  which  is  being  distributed  simultaneously 
with  this  Final  Report. 
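
As a small worked illustration of the parameter count, the sketch below assumes that the degrees of freedom equal the number of free item parameters in the unconstrained model less the total number of level sets across items. The function name and the example figures in the note that follows are hypothetical.

```python
def monotone_test_df(n_items, n_states, level_sets_per_item):
    """Degrees of freedom: unconstrained item parameters minus level sets.

    n_items             : J, the number of items
    n_states            : s, the number of latent states
    level_sets_per_item : list of m_j, one count per item
    """
    if len(level_sets_per_item) != n_items:
        raise ValueError("need one level-set count per item")
    return n_items * n_states - sum(level_sets_per_item)
```

For instance, with three items and four states, and level-set counts of 4, 3, and 2, there would be twelve unconstrained item parameters and nine constrained ones.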

Some Important Technical Questions

This  section  describes  the  results  of  some  preliminary  analyses 
which  might  help  answer  the  following  questions  regarding  signed-number 
addition  test  performance: 

1. Are item parameters invariant from one testing to the next?

2.  Are  the  states  into  which  subjects  are  classified  on  different 
response  components  related?  If  so,  can  a  simple  model  be 
found  relating  the  distribution  on  the  joint  classification  to 
the  marginal  distributions  on  the  separate  components? 

3.  How  do  subjects  move  from  state  to  state  during  the  course  of 
learning? 

If  the  item  parameters  are  invariant  over  time,  then  changes  in 
performance  can  be  interpreted  as  transitions  between  states;  if  they 
are  not,  the  interpretation  of  change  is  problematical.  Even  if  the 
changes  in  parameters  over  time  are  relatively  minor  deviations  which 
do not affect the qualitative interpretations of the states, parameter-dependent
statistical procedures for characterizing test performance
might  be  adversely  affected  by  them. 

In  a  completely  satisfactory  componential  model  for  test 
responses,  the  number  of  states  needed  to  characterize  the  responses  is 
the  product  of  the  numbers  of  states  in  the  models  for  the  respective 
components.  Accurate  estimation  of  parameters  in  the  comprehensive 
model  is  not  likely  to  be  feasible  unless  a  simple  model  relating  the 
joint  distribution  over  states  to  the  marginal  distributions  can  be 
found. 

The question regarding the transitions between states which
subjects  make  during  the  course  of  learning  makes  sense  even  if  the 
nature  of  the  states  changes  from  points  early  in  learning  to  points 
late  in  learning.  Which  transitions  occur  most  often  might  well  have 
pedagogical  significance.  It  may  also  have  theoretical  implications 
for  methods  of  assessing  change. 

The  data  to  be  presented  in  addressing  these  questions  comes  from 
a  panel  of  Junior  High  School  students  in  Urbana,  Illinois  who  were 
studied  by  Tatsuoka  and  Birenbaum.  Most  of  them  took  their  first 
signed-number  arithmetic  test  at  the  same  time  as  the  students 
discussed  by  Tatsuoka  and  Birenbaum  (1979),  whose  data  I  analyzed  in 
detail  in  Paulson  (1985).  When  first  tested,  the  students  had  only 
received  a  small  amount  of  experimental  instruction  on  signed-numbers. 
As  was  expected,  many  of  them  still  did  not  understand  signed-number 
addition  after  this  brief  exposure.  The  panel  of  students  was  next 
tested  at  the  beginning  of  regular  classroom  instruction  on 
signed-numbers,  after  an  interval  of  some  weeks.  Thus,  the  second  test 
was  essentially  a  retention  test.  There  is  data  on  59  students  at  this 
second  testing.  Two  of  our  analyses  involve  data  on  the  second  test;  a 
third  analysis  involves  the  relationship  between  performance  on  the 
first  and  second  tests.  There  is  also  data  available  for  many  of  these 
subjects  from  two  tests  later  in  instruction.  This  data  will  not  be 
presented  here  in  detail,  because  the  number  of  subjects  who  did  not 
master signed-number addition before the later tests was too small.

Parameter invariance. Table 1 gives parameter estimates based on
data on the magnitude response component from subjects on the second
test for two different models. Both models assume that all items of a
given  type  have  identical  parameter  values.  The  first  model  constrains 
the  item  parameters  to  be  equal  to  the  estimates  based  on  the  data  from 
the  first  testing.  Only  the  parameters  giving  the  distribution  of 
subjects over states are reestimated using the data from the second
test.  The  second  model  reestimates  all  the  parameters.  The 
likelihood-ratio  chi-squared  statistic  for  testing  the  hypothesis  that 
the  second  set  of  item  parameters  is  identical  to  the  first  set  is 
highly significant: $\chi^2(25) = 69.46$, $p < .001$. Hence, the hypothesis of
parameter  invariance  must  be  rejected.  Examination  of  Table  1  reveals, 
however,  that  none  of  the  differences  between  the  item  parameters 
obtained  on  the  two  occasions  affects  the  qualitative  interpretations 
of  the  patterns  of  responses  to  different  item  types  in  the  various 
states.  All  of  the  differences  are  less  than  .20  and  the  only 
parameter  values  which  change  from  less  than  .50  to  greater  than  .50, 
or vice versa, are those which fall in the .40-.60 range for both
models. 


Insert Table 1 about here

Table 1. Comparison of item parameter estimates for the magnitude response component,
based on tests of the same subjects on two different occasions (N=59).

Some of the differences which contribute to the significant
chi-squared statistic are the following. In the model with item
parameters constrained to equal their values on the first test,
relatively more of the subjects would be classified as belonging to the
random response state and relatively fewer to the systematic response
states than would be so classified in the model with recalibrated item
parameters. When the item parameters are reestimated, a few more
subjects  appear  to  have  mastered  the  component  and  a  few  more  appear  to 
fall  into  the  "almost  always  add"  error  pattern.  Other  things  being 
equal,  response  patterns  in  the  random  state  tend  to  have  smaller 
marginal  likelihoods  than  patterns  typical  of  systematic  states 
represented  in  the  model.  Levine  and  Drasgow  (1980)  make  a  similar 
observation  in  connection  with  appropriateness  measurement:  in  latent 
trait  models,  the  conditional  likelihoods  of  response  patterns,  given 
the maximum likelihood estimate of $\theta$ for the response pattern, tend to
increase with $\theta$. This explains how the moderate deviations between the
item parameters on the two occasions lead to the substantial
goodness-of-fit statistic which we get.

The  relationship  between  components.  The  rest  of  the  analyses  we 
shall  report  are  based  on  the  frequency  distributions  over  combinations 
of states in component-by-component cross-classification tables. Since
the  states  are  not  directly  observable,  we  have  to  resort  to  indirect 
means  to  obtain  these  frequency  tabulations.  Rather  than  adding  one 
tally  to  the  appropriate  cell  for  each  individual,  we  apportion  the  one 
tally  for  each  subject  to  cells  on  the  basis  of  the  likelihoods  of 
respective  states,  given  the  response  pattern  for  each  individual.  To 
simplify  matters,  we  assume  that,  conditional  upon  the  response 
pattern, the states on the respective components are independent.
Hence, we add the product of the likelihoods of the states on the
components comprising each cell to each of the cells in the table.
This yields a table of expected frequencies, which are usually
fractions. The estimated joint distribution of subjects over states on
the sign and magnitude components on the second test is given in Table 2.
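
The apportioning of each subject's single tally can be sketched as follows, taking the "likelihoods of respective states, given the response pattern" to be each subject's conditional state probabilities on each component. The function and argument names are hypothetical.

```python
def expected_crosstab(post_a, post_b):
    """Expected frequencies over combinations of states on two components.

    post_a : post_a[i][r], conditional probability that subject i is in
             state r on the first component, given the response pattern
    post_b : post_b[i][c], the same for state c on the second component
    Each subject contributes post_a[i][r] * post_b[i][c] to cell (r, c),
    reflecting the assumed conditional independence of the components.
    """
    n_rows, n_cols = len(post_a[0]), len(post_b[0])
    table = [[0.0] * n_cols for _ in range(n_rows)]
    for pa, pb in zip(post_a, post_b):
        for r in range(n_rows):
            for c in range(n_cols):
                table[r][c] += pa[r] * pb[c]
    return table
```

When each subject's state probabilities sum to one on both components, the cell entries sum to the number of subjects, even though the individual cells are generally fractional.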


Insert  Table  2  about  here 


The phi-coefficients between the mastery/non-mastery dichotomies
on the sign and magnitude components are significantly greater than
zero on all tests, falling in the range .48 to .62. Thus, the simplest
model  for  the  joint  distribution,  which  assumes  that  classifications  on 
the  two  components  are  independent,  fails  on  every  testing.  A  simple 
model  which  takes  the  association  between  mastery  on  the  two  components 
into  account,  but  implies  conditional  independence,  given  that  one 
component  or  the  other  has  not  been  mastered,  can  be  specified  as 
follows. Let $\pi_{ij}$ be the joint probability of a subject being in
state i on the sign component and state j on the magnitude component.
Let $\pi_{i\cdot}$ and $\pi_{\cdot j}$ denote the corresponding marginal probabilities of
states on the components, and let X denote the covariance between the
two mastery/non-mastery dichotomies. Then the simple dependence model
is given by

Test of goodness-of-fit of simple dependence model:

In this notation, state 1 is the mastery state on both components.

Yamamoto  (1983)  showed  that  this  simple  dependence  model  fits  the 
data  from  the  first  test  quite  well.  In  fact,  a  restricted  form  of  the 
model  in  which  mastery  of  the  magnitude  component  implies  mastery  of 
the  sign  component,  gives  a  satisfactory  account  of  the  data. 

The  data  in  Table  2  show  that  the  restricted  form  of  the  model  for 
dependence  breaks  down  on  the  retention  test  (Test  2),  because  several 
subjects  who  appear  to  have  mastered  the  magnitude  component  have  not 
mastered  the  sign  component.  Nevertheless,  the  general  form  of  the 
model  fits  the  data  very  well. 

The simple dependence model does a pretty good job of accounting
for the data on the third and fourth tests also. Only on the fourth
test does it show any sign of breaking down. Two subjects on that test
form a class by themselves on both components: on the sign component,
they tend to take the sign of the second addend as the sign of the sum;
on the magnitude component, they tend to subtract when the sign of the
second addend is negative and add otherwise. As was indicated above,
most subjects have mastered both components by the fourth test. Only
11 cells in the 5x6 contingency table have expected frequencies greater
than 1 under the simple dependence model, so the appropriateness of the
goodness-of-fit test is subject to question. The distribution of the
other  76  of  the  78  subjects  who  took  the  test  was  in  good  accord  with 
the  simple  dependence  model.  Under  most  circumstances  a  significant 
test  statistic  which  is  entirely  due  to  2  observations  falling  in  a 
cell  with  very  small  expected  frequency  should  be  viewed  with 
skepticism.  Certainly,  the  model  represents  most  of  the  data  well.  In 
this case, however, the "outliers" make good psychological sense and
serve  to  demonstrate  how  the  model  would  be  likely  to  break  down  in 
practice.  It  would  be  a  mistake  to  ignore  them. 

Transitions  between  states.  Examination  of  the  joint  distribution 
of  subjects'  states  on  the  magnitude  component  on  the  first  two  tests, 
given  in  Table  3,  shows  that  approximately  three-fourths  of  the 
subjects  either  stayed  in  the  state  they  were  in  on  the  first  test  or 
moved  to  the  random  state.  This  pattern  applies  to  transitions  from 
the  mastery  state  on  the  first  test  and  to  the  transitions  from  all  but 
one  of  the  systematic  error  states  as  well.  The  same  tendency  appears 
in  the  transitions  from  state  to  state  between  the  second  and  third 
test  and  between  the  third  and  fourth  test,  except  that  transitions  to 
the  mastery  state  become  the  most  common  transition  from  every  state 
after  classroom  instruction  begins. 
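
Estimated transition probabilities of the kind discussed here can be obtained from a joint table of (possibly fractional) expected frequencies by dividing each row by its total. The sketch below assumes rows index states on the earlier test and columns index states on the later test; the function name is hypothetical.

```python
def transition_probabilities(joint):
    """Row-normalize a joint frequency table into transition probabilities.

    joint : joint[r][c], expected number of subjects in state r on the
            earlier test and state c on the later test
    Rows with no subjects are returned as all zeros.
    """
    result = []
    for row in joint:
        total = sum(row)
        if total > 0:
            result.append([value / total for value in row])
        else:
            result.append([0.0 for _ in row])
    return result
```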

Insert Table 3 about here

Table 3. Joint distribution of subjects' states on the magnitude response component on
the first and second tests. Estimates of the transition probabilities from
states on the first test to states on the second test are given in
parentheses below the frequencies in the cross-tabulation.

These results have a certain verisimilitude in the context of the
latent class model. The fact that many subjects have similar response
patterns on both tests lends credibility to our qualitative
interpretations of these response patterns as states. The fact that
many other subjects shift from systematic responding to more or less
random responding after a period of no instruction on signed-numbers
would probably not come as a surprise to their teachers. While these
results make sense in terms of the latent class model, it might be
noted that they would not be well represented by the statistical models
usually employed in assessing change. The latter models are implicitly
or  explicitly  unidimensional.  If  individual  differences  in  the  amount 
of  change  are  allowed  for  at  all,  they  are  regarded  as  random  effects. 
At  least  in  the  present  instance,  latent  class  models  provide  a  richer, 
and  apparently  more  valid,  representation  of  the  changes. 

Summary 

This  project  has  developed  the  general  latent  class  model  as  a 
framework  for  representation  of  item  responses.  This  framework  can  be 
used  to  represent  data  in  applications  such  as  mastery  tests  and  other 
kinds  of  achievement  tests,  where  there  is  reason  to  believe  that 
current  foundations  are  deficient.  Methods  of  estimation  for  the 
latent  class  model  have  been  improved  and  hypothesis  tests  addressing 
issues  important  in  developing  specific  models  for  test  data  have  been 
devised.

These  hypothesis  tests  include  a  test  for  monotone  homogeneity  of 
items,  tests  of  invariance  of  item  parameters  between  groups  and  over 
time,  a  test  for  the  significance  of  inclusion  of  a  new  state  in  a 
model,  and  other  tests.  A  nonparametric  approach  to  maximum  likelihood 
estimation  of  item  response  functions  for  monotonely  homogeneous  sets 
of  items  has  been  devised.  It  is  easy  to  generate  these  tests  in 
principle,  because  of  the  ease  of  dealing  with  various  specifications 
of fixed, equality, complementarity, and monotone homogeneity
constraints  in  the  estimation  algorithm. 




The  use  of  this  general  approach  has  been  illustrated  by 
developing  models  which  successfully  represent  signed-number  addition 
test  data  gathered  by  Tatsuoka  and  Birenbaum  (1979).  These  models  are 
noteworthy  because  Tatsuoka  and  Birenbaum  have  shown  (and  our  new 
monotone  homogeneity  test  has  confirmed)  that  this  data  cannot  in 
principle  be  represented  by  a  unidimensional  model.  A  number  of 
technical issues relating to these models are discussed.